there are inevitably bottlenecks due to one parallelized component of the simulation
having to wait for the output from another before it can proceed. Multiple different
simulations can, of course, at least be run simultaneously, for example, independent
replicates of simulations that are stochastic in nature, or deterministic simulations
started from multiple different starting conditions.
3 Supercomputing resources have improved enormously, with dedicated clusters of
ultrafast multiple-core CPUs coupled with locally dedicated ultrahigh bandwidth
networks. Academic research supercomputers are in general shared among several
users, with supercomputing time allocated in blocks to enable batch calculations,
which can be distributed to different computer nodes in the supercomputing
cluster.
4 Recent developments in GPU technology have revolutionized molecular simulations.
Although the primary function of a GPU is to assist with the rendering of graphics
and visual effects so that the CPU of a computer does not have to, a modern GPU
has many features that are attractive for brute number-crunching tasks, including
molecular simulations. In essence, CPUs are designed to be flexible in performing
several different types of tasks, for example, involving communicating with other
systems in a computer, whereas GPUs have more limited scope but can perform
basic numerical calculations very quickly. A programmable GPU contains several
dedicated multiple-core processors well suited to Monte Carlo methods and MD
simulations with a computational power far in excess of a typical CPU. Depending
on the design, a CPU core can execute up to 8 × 32-bit instructions per clock cycle
(i.e., 256 bits per clock cycle), whereas a fast GPU used for 3D video-gaming purposes
can execute ~3200 × 32-bit instructions per clock cycle, a difference in throughput
per clock cycle of a factor of ~400. A very-high-end CPU having, for example, ~12 cores
has a higher clock rate of up to 2–3 GHz versus 0.7–0.8 GHz for GPUs, but even compared
with four such 12-core CPUs coupled together, a single reasonable gaming GPU is faster
by at least a factor of 5 and, at the time of writing, cheaper by at least an order
of magnitude. GPUs can now be programmed relatively easily to perform molecular
simulations, outperforming more typical multicore CPUs by a factor of ~100 in speed.
GPUs have now also been incorporated into supercomputing clusters. For example,
the Blue Waters supercomputer at the University of Illinois at Urbana-Champaign is, as
I write, the fastest supercomputer on any university campus and indeed one of the
fastest supercomputers in the world. It can use four coupled GPUs, and a VMD calculation
of the electrostatic potential for one frame of an MD simulation of the ribosome (an
enormously complex biological machine containing over 100,000 atoms with a large length
scale of a few tens of nm; see Chapter 2) has been performed in just 529 s using just
one of these available GPUs, as opposed to ~5.2 h on a single ultrafast CPU core.
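The figures quoted in item 4 can be checked with some simple arithmetic. The short Python sketch below is only an illustration: the variable and function names are made up for this example, and the "peak rate" is an idealized estimate that assumes every clock cycle is fully used and ignores memory access, communication, and the number of CPU cores.

```python
# Approximate figures quoted in the text.
CPU_INSTR_PER_CLOCK = 8       # 32-bit instructions per clock cycle, one CPU core
GPU_INSTR_PER_CLOCK = 3200    # 32-bit instructions per clock cycle, gaming GPU
CPU_CLOCK_GHZ = (2.0, 3.0)    # quoted CPU clock-rate range
GPU_CLOCK_GHZ = (0.7, 0.8)    # quoted GPU clock-rate range


def peak_rate(instr_per_clock, clock_ghz):
    """Idealized peak rate in units of 1e9 instructions per second.

    Assumption for illustration only: every clock cycle issues the
    maximum number of instructions.
    """
    return instr_per_clock * clock_ghz


# Ratio of instructions per clock cycle (the factor of ~400 in the text).
per_clock_ratio = GPU_INSTR_PER_CLOCK / CPU_INSTR_PER_CLOCK

# GPU versus a single CPU core once clock rates are included:
# least favourable pairing (slow GPU, fast CPU) and most favourable pairing.
ratio_low = (peak_rate(GPU_INSTR_PER_CLOCK, GPU_CLOCK_GHZ[0])
             / peak_rate(CPU_INSTR_PER_CLOCK, CPU_CLOCK_GHZ[1]))
ratio_high = (peak_rate(GPU_INSTR_PER_CLOCK, GPU_CLOCK_GHZ[1])
              / peak_rate(CPU_INSTR_PER_CLOCK, CPU_CLOCK_GHZ[0]))

# Measured speedup for the ribosome electrostatics example:
# ~5.2 h on one CPU core versus 529 s on one GPU.
ribosome_speedup = (5.2 * 3600) / 529

if __name__ == "__main__":
    print(f"per-clock ratio:        {per_clock_ratio:.0f}")
    print(f"peak ratio (one core):  {ratio_low:.0f} to {ratio_high:.0f}")
    print(f"ribosome speedup:       {ribosome_speedup:.1f}")
```

The measured ribosome speedup (a factor of roughly 35) is well below the idealized per-clock ratio, which is what one would expect when real calculations are limited by more than raw instruction throughput.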
The key advantage of GPUs is that they currently offer better performance per
dollar than several high-end CPU cores applied together in a supercomputer, either
over a distributed computer network or clustered together in the same machine. A GPU
can be installed on an existing computer and may enable larger calculations for less
money than building a cluster of computers. However, several supercomputing clusters
have GPU nodes now. One caveat is that GPUs do not necessarily offer good performance
on any arbitrary computational task, and writing code for a GPU can still present
issues with efficient memory use.
One should also be mindful of the size of the computational problem and whether a
supercomputer is needed at all. Supercomputers should really be used for very large jobs that no
other machine can take on and not be used to make a small job run a bit more quickly. If you
are running a job that does not require several CPU cores, you should really use a smaller
computer; otherwise, you would just be hogging resources that would be better spent on
something else. This idea is the same for all parallel computing, not just for problems in
molecular simulation.
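As a small illustration of a job that does not need a supercomputer, independent replicates of a stochastic simulation can simply be spread over the cores of an ordinary workstation. The sketch below uses Python's standard `concurrent.futures` module. The toy "simulation" (a one-dimensional random walk) and the function names `squared_displacement` and `run_replicates` are hypothetical stand-ins chosen for this example, not a real molecular simulation code.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def squared_displacement(seed, n_steps=10_000):
    """One independent replicate: an unbiased 1D random walk.

    Toy stand-in for a stochastic simulation; each replicate has its own
    seeded random number generator, so replicates do not share any state.
    """
    rng = random.Random(seed)
    x = 0
    for _ in range(n_steps):
        x += rng.choice((-1, 1))
    return x * x


def run_replicates(seeds, max_workers=None):
    """Run one replicate per seed in separate processes on the local machine.

    With max_workers=None the pool size defaults to the number of
    available processors.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(squared_displacement, seeds))


if __name__ == "__main__":
    results = run_replicates(range(8))
    mean_sq = sum(results) / len(results)
    # For an unbiased walk the expected value equals the number of steps.
    print(f"mean squared displacement over {len(results)} replicates: {mean_sq:.0f}")
```

Because the replicates never need to exchange data, no component has to wait for another, so this kind of workload avoids the bottlenecks that arise when a single simulation is split across many processors.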